[BWARE] Speed up frame-to-matrix conversion and harden number parsing #2480

Merged: Baunsgaard merged 3 commits into apache:main from Baunsgaard:split/doubleParser on Jun 9, 2026
@Baunsgaard

Tightens the hot path that converts FrameBlocks of arbitrary schema into MatrixBlocks, with a defensive fallback for malformed cells.

  • DoubleParser.parseFloatingPointLiteral: replace the >= 'I' / >= 'a' character guards with a single 0-9 range check on the last character. The previous guards over-matched and pushed too many strings into the slow Double.parseDouble path.
  • DoubleArray.parseDouble: stop wrapping the parse failure as a DMLRuntimeException so callers can distinguish format errors
  • MatrixBlockFromFrame:
    • turn the interface into a class with a private constructor so Jacoco can measure it cleanly
    • on NumberFormatException / DMLRuntimeException during a bulk block convert, log once and fall back to convertSafeCast which writes NaN per offending cell instead of failing the whole job
    • add convertSafeCast / convertBlockSafeCast helpers
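
The trailing-character guard from the first bullet can be illustrated with a minimal sketch. This is not the SystemDS implementation: the class `TrailingDigitGuardSketch`, its `parse` method, and the simplified fast path (sign, at most 15 digits, one optional decimal point) are hypothetical stand-ins. It only shows the routing idea: strings whose last character is outside 0-9 (for example "NaN", "Infinity", or "1.5d") go straight to `Double.parseDouble`, while everything else is tried on a cheaper path first.

```java
final class TrailingDigitGuardSketch {
    // Powers of ten that are exactly representable as doubles.
    private static final double[] POW10 = {
        1e0, 1e1, 1e2, 1e3, 1e4, 1e5, 1e6, 1e7,
        1e8, 1e9, 1e10, 1e11, 1e12, 1e13, 1e14, 1e15
    };

    private TrailingDigitGuardSketch() {}

    /** Parses s, using a simplified fast path when the last character is a digit. */
    static double parse(String s) {
        final int n = s.length();
        if (n == 0)
            throw new NumberFormatException("empty string");

        // Guard: a single 0-9 range check on the last character.
        final char last = s.charAt(n - 1);
        if (last < '0' || last > '9')
            return Double.parseDouble(s); // slow path: NaN, Infinity, type suffixes, garbage

        int i = 0;
        boolean negative = false;
        final char first = s.charAt(0);
        if (first == '-' || first == '+') {
            negative = first == '-';
            i = 1;
        }

        long mantissa = 0;
        int digits = 0;
        int fractionDigits = 0;
        boolean seenDot = false;
        for (; i < n; i++) {
            final char c = s.charAt(i);
            if (c == '.' && !seenDot) {
                seenDot = true;
            } else if (c >= '0' && c <= '9') {
                if (++digits > 15)
                    return Double.parseDouble(s); // too many digits for the exact fast path
                mantissa = mantissa * 10 + (c - '0');
                if (seenDot)
                    fractionDigits++;
            } else {
                return Double.parseDouble(s); // exponents, whitespace, anything unexpected
            }
        }

        // Mantissa and power of ten are both exact, so one division rounds correctly.
        final double value = (double) mantissa / POW10[fractionDigits];
        return negative ? -value : value;
    }
}
```

In this sketch the slow path stays the single source of truth for error handling: malformed input such as "abc" still ends in the `NumberFormatException` thrown by `Double.parseDouble`.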

@codecov

codecov Bot commented Jun 8, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 71.39%. Comparing base (88c26e2) to head (6ce2d8c).
⚠️ Report is 11 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2480      +/-   ##
============================================
+ Coverage     71.37%   71.39%   +0.01%     
- Complexity    48749    48777      +28     
============================================
  Files          1571     1571              
  Lines        188912   188942      +30     
  Branches      37067    37071       +4     
============================================
+ Hits         134845   134898      +53     
+ Misses        43601    43590      -11     
+ Partials      10466    10454      -12     


Frame-to-matrix conversion silently fell back to writing NaN for cells
that could not be cast to double, logging only a single warning. This
changed behavior for callers that previously failed fast on incompatible
data. Make the lenient behavior opt-in.

- Add sysds.frame.tomatrix.warncast (default false): when false the
  conversion fails fast on number format errors; when true it warns once
  and writes NaN for the incompatible cells
- Read the flag once on the calling thread and pass it down, since the
  thread-local config is not visible to pool workers
- Extract convertStrict to share the contiguous/generic dispatch between
  the strict and warn-only paths
- Add tests for the warn-only fallback, the strict fail-fast default, and
  the tightened DoubleParser trailing-character guard
- Cover the warn-cast success path where a fully valid frame converts
  without triggering the NaN fallback
- Cover the zero-value branch of the safe-cast non-zero count and an
  all-invalid frame becoming all NaN
- Add parallel fail-fast coverage for the strict (default) path
- Assert DoubleArray.parseDouble surfaces the raw NumberFormatException
  instead of a wrapped DMLRuntimeException
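
The opt-in behavior described in the list above can be sketched as follows. This is an illustration under stated assumptions, not the SystemDS code: `WarnCastSketch`, its `WARN_CAST` thread-local (a stand-in for the thread-local config holding `sysds.frame.tomatrix.warncast`), and the `String[]` column are all hypothetical. The sketch reads the flag once on the calling thread and passes it to the pool workers, fails fast when the flag is false, and otherwise warns once and writes NaN for cells that do not parse.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicBoolean;

final class WarnCastSketch {
    // Hypothetical stand-in for a thread-local configuration value (default: strict).
    static final ThreadLocal<Boolean> WARN_CAST = ThreadLocal.withInitial(() -> Boolean.FALSE);

    private WarnCastSketch() {}

    /** Converts a string column to doubles in blocks; blockSize must be positive. */
    static double[] convert(String[] column, int blockSize, ExecutorService pool) {
        // Read once on the calling thread: pool workers would see their own default.
        final boolean warnCast = WARN_CAST.get();
        final double[] out = new double[column.length];
        final AtomicBoolean warned = new AtomicBoolean(false);

        List<Future<?>> tasks = new ArrayList<>();
        for (int start = 0; start < column.length; start += blockSize) {
            final int lo = start;
            final int hi = Math.min(start + blockSize, column.length);
            tasks.add(pool.submit(() -> convertBlock(column, out, lo, hi, warnCast, warned)));
        }
        for (Future<?> f : tasks) {
            try {
                f.get();
            } catch (ExecutionException e) {
                // Surface the raw format error instead of a wrapped exception.
                if (e.getCause() instanceof NumberFormatException)
                    throw (NumberFormatException) e.getCause();
                throw new IllegalStateException(e.getCause());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return out;
    }

    private static void convertBlock(String[] in, double[] out, int lo, int hi,
            boolean warnCast, AtomicBoolean warned) {
        try {
            // Strict bulk path: the first malformed cell aborts the block.
            for (int i = lo; i < hi; i++)
                out[i] = Double.parseDouble(in[i]);
        } catch (NumberFormatException e) {
            if (!warnCast)
                throw e; // fail fast (default)
            if (warned.compareAndSet(false, true))
                System.err.println("WARN: incompatible cells are written as NaN: " + e.getMessage());
            // Lenient path: redo the block cell by cell.
            for (int i = lo; i < hi; i++)
                out[i] = parseOrNaN(in[i]);
        }
    }

    private static double parseOrNaN(String s) {
        try {
            return Double.parseDouble(s);
        } catch (NumberFormatException e) {
            return Double.NaN;
        }
    }
}
```

Passing the flag as a plain boolean keeps the workers independent of whatever thread-local state the caller holds, which is the reason the commit gives for reading it up front.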
@Baunsgaard Baunsgaard merged commit 17803b1 into apache:main Jun 9, 2026
85 of 87 checks passed
@github-project-automation github-project-automation Bot moved this from In Progress to Done in SystemDS PR Queue Jun 9, 2026